We introduce a semiparametric "tubular neighborhood" of a parametric
model in the multinomial setting. It consists of all multinomial
distributions lying in a distance-based neighborhood of the parametric
model of interest. Fitting such a tubular model allows one to use
a parametric model while treating it as an approximation to the true
distribution. In this paper, the Kullback-Leibler distance is used to
build the tubular region. Based on this idea one can define the distance
between the true multinomial distribution and the parametric
model to be the index of fit. The paper develops a likelihood ratio
test procedure for testing the magnitude of the index. A semiparametric
bootstrap method is implemented to better approximate the
distribution of the LRT statistic. The approximation permits more
accurate construction of a lower confidence limit for the model fitting
index.

1. Introduction. The conventional approach for assessing goodness of
fit in multinomial models is discussed in several standard sources [see, e.g.,
Bishop, Fienberg and Holland (1975) and Agresti (1990)]. The agreement
between the model and the data is assessed with a goodness-of-fit test
statistic, such as the Pearson chi-squared statistic, the likelihood ratio statistic
or other measures of divergence [Read and Cressie (1988)], and the model is
then accepted as true or rejected as false. These methods of evaluation rely
on $\chi^2$-statistics or quantities derived from them. In cases where the sample
size $n$ is sufficiently large, the model will usually be rejected in favor of some
more complicated model. Goodness-of-fit statistics are therefore usually not
informative when the sample size is very large. The question they fail to address is:
"Is the specified model a good approximation to the true distribution of the
sample?"

In 1994, Rudas, Clogg and Lindsay presented a framework based on mixture
methods for evaluating goodness of fit of contingency tables. Suppose
$\tau$ is the true cell probability. For a given parametric model $\mathcal{M}$, $\tau$ can generally
be written as a two-point mixture $\tau = (1 - \pi)m + \pi e$, where $m$ is an
element from the model $\mathcal{M}$, $e$ is an unspecified distribution corresponding
to the modeling error, and $\pi$ is the mixture weight. The mixture index of
fit, $\pi^*$, is defined to be the smallest such $\pi$. That is, $\pi^*$ is the fraction of the
population that could not possibly be described by model $\mathcal{M}$. The larger
the value of $\pi^*$, the more discrepant the true distribution is from any model
distribution. Thus one can think of $\pi^* = \pi^*(\tau, \mathcal{M})$ as measuring a distance
from $\tau$ to the model $\mathcal{M}$. This approach focuses on measuring the difference
between truth and model, and so on the substantive importance of the
discrepancy between the model and the data. This permits an evaluation of
the model that downplays the role of sample size.

In this paper, we extend the results of Rudas et al. to the Kullback-Leibler
distance, $K^2(\tau, m) = \sum m \log[m/\tau]$. Our motivation for choosing
this distance is twofold. First, the $\pi^*$ distance measure has some undesirable
features. We can write

$\pi^*(\tau, \mathcal{M}) = 1 - \sup_{\theta} \min_{t} \tau(t)/m_\theta(t),$

where $t$ is the cell index and $\theta$ is the parameter of the model [Xi (1996),
page 14]. The function $\pi^*$ is not everywhere differentiable, which generates
a nonstandard asymptotic theory [Xi (1996), page 78] and difficulties in
computation [Xi and Lindsay (1996)]. On the other hand, $K^2$ is smooth and
generates asymptotically efficient parameter estimators, as we will show.
Second, a reason for choosing $K^2$ from among smooth distances is the simple and
useful geometric structure that arises from $K^2$, to be described below.

Given any statistical distance function $\rho(\tau, m)$, such as $K^2(\tau, m)$, we can
define the statistical distance from the model $\mathcal{M}$ to the true density $\tau$ via

(1)    $\rho(\tau, \mathcal{M}) = \inf_{m \in \mathcal{M}} \rho(\tau, m).$

The index $\rho^* = \rho(\tau, \mathcal{M})$ is then a measure of the distance from the parametric
model $\mathcal{M}$ to the true distribution $\tau$. We can then build a tolerance region
about $\mathcal{M}$, which consists of all distributions $\tau$ that satisfy $\rho(\tau, \mathcal{M}) \le c$, where
$c$ is some specified level of tolerance.

In this paper we consider the problem of testing the hypothesis

(2)    $H_0: K^2(\tau, \mathcal{M}) \le c.$

That is, does the model $\mathcal{M}$ provide an acceptable approximation to the true
distribution $\tau$, where the word "acceptable" is measured by $K^2$ and indexed
by the constant $c$? We will test the hypothesis using the likelihood ratio test
statistic in Section 3.1, in common with Rudas, Clogg and Lindsay (1994).
Our reason for using the likelihood ratio test (LRT) is the well-known power of
this procedure in detecting deviations from the null hypothesis. Of course,
given a test statistic for each $c$-level, one can invert the LRT to obtain a
lower confidence limit for $K^2(\tau, \mathcal{M})$.

An interesting and very useful feature arises from using the likelihood
ratio as a test statistic together with $K^2(\tau, m)$ as the distance. We show
in Section 3 that the maximum likelihood estimator $\hat p$ of the probability
distribution under $H_0$ has the form

(3)    $\hat p = \pi d + (1 - \pi) m_{\hat\theta},$

where $m_{\hat\theta}$ is an estimator of the model element closest to $\tau$, $d$ is the vector
of observed data and the weight $\pi \in [0, 1]$ depends on $c$. That is, viewed
geometrically, the null estimator is a convex combination (mixture) of the
best model distribution and the data proportions $d$. This leads to a very
simple method, essentially an iteratively reweighted maximum likelihood
algorithm, for finding the maximum likelihood estimator $\hat\theta$ and hence the
LRT statistic. Section 3 concludes by showing why the estimator $\hat\theta$ is both
efficient under the model and robust under model failure.

In Section 4 we consider two methods to set critical values for the test. 
One is based on asymptotic distribution theory and the other, more accurate, 
method is based on a semiparametric bootstrap using (3), the null hypothesis 
estimator. Our first real data example is in Section 5, where we compare our 
methods with the conventional likelihood method on two simple data sets. 
Section 6 contains the analysis of a richer multiway table, and we conclude 
in Section 7. 

2. Distance-based tubular models. Given $\rho(\tau, m)$, a distance, and $\rho(\tau, \mathcal{M})$
defined in (1), we will say that the model $\mathcal{M}$ is adequate at level $c$ if
$\rho(\tau, \mathcal{M}) \le c$. This is equivalent to saying that $\tau$ is in the model tube $\mathcal{M}_c$,
where the model tube $\mathcal{M}_c$ is defined as all multinomial densities $p$ sufficiently
close to the model $\mathcal{M}$:

$\mathcal{M}_c = \bigcup_{m \in \mathcal{M}} \{p : \rho(p, m) \le c\}.$

This representation shows that a tube can be described as the union of balls
around each model element. Since $\tau \in \mathcal{M}_c \Leftrightarrow \rho(\tau, \mathcal{M}) \le c$, we can think of
$\mathcal{M}_c$ as the extended model defined by the null hypothesis (2).

2.1. Selecting the tolerance level c. The idea of using a tolerance zone
around a parametric model for goodness-of-fit testing can be found in Hodges
and Lehmann (1954), Goutis and Robert (1998) and Dette and Munk (2003),
as well as Rudas, Clogg and Lindsay (1994).

One important and challenging issue with the tubular hypothesis is the
choice of the bound $c$. Rudas, Clogg and Lindsay (1994) supposed the true
distribution can be written as a two-point mixture $\tau = (1 - \pi)m + \pi e$. The
$\pi^*$ index is defined as the smallest such $\pi$. The $\pi^*$ index has a natural
mixture interpretation, so $c = 0.05$, say, can be thought of as 5% contamination.
However, if one uses a standard distance, as in the other three papers, it is very
difficult to give a direct interpretation to $c = 0.05$.

Hodges and Lehmann (1954) proposed the tolerance zone around the null
hypothesis in a number of testing problems. They used ordinary Euclidean
distance or a weighted Euclidean distance (for example, the Neyman
chi-squared distance) as the distance from a model element to the truth. Hodges
and Lehmann did not give a detailed discussion on how one should choose
$c$.

Goutis and Robert (1998) proposed a Bayesian model selection approach 
using a tubular model based on the likelihood distance. The problem of
generating a proper bound $c$ had been considered by Mengersen and Robert
(1996) and Dupuis (1997). In their models the distance was bounded by a 
constant. They selected c to be a fixed fraction of this bound. However, the 
choice of the fixed fraction is still an issue. 

Dette and Munk (2003) discussed the tubular hypothesis with ordinary 
Euclidean distance for nonparametric regression models. They thought that 
the choice of c was one of the "most difficult tasks." Dette and Munk consid- 
ered model selection more as an explorative data analysis and let c = 0, 0.5 
and 1 in their examples. Dette and Munk suggested that a proper analysis 
of the power would be helpful. 

Although there is no perfect way to interpret a standard distance in the 
literature, we do offer two additional assessment tools that can provide guid- 
ance in deciding on an acceptable distance. We used the Kullback-Leibler 
distance to build the tubular model. 

First, the square root of $K^2(\tau, \mathcal{M})$ is similar in magnitude to the $\pi^*$ index
of Rudas, Clogg and Lindsay (1994), as we found in the examples of Sections 5
and 6. This similarity helps us to interpret the distance. The index $\pi^*$ is the
proportion of the population outside of the model. So the value of root $K^2$
roughly measures the same degree of faith in the model, with $\pi^* = 0$ ($\sqrt{K^2}$
close to zero) representing full use of the model estimation and $\pi^* = 1$ ($\sqrt{K^2}$
close to 1) representing complete discard of the model.

Secondly, another very statistical way to think about the distance between 
two distributions F and G is to ask about the sampling consequences of the 
distance. In particular, how different are samples of size n from these two 
distributions? One could, for example, ask what the power would be if one 
were to take one sample from each and test the null hypothesis that they 
were from the same distribution. This idea is due to Davies (1995). The
problem with this measure is that it is dependent on the sample size n, and
so not an absolute measure of difference.

Lindsay and Liu (2005) proposed an interpretable measure by determining 
the sample size N* at which one would have 50% power in a test of F 
versus G, with the idea that at 50% power, the differences in the samples 
between F and G are starting to become obvious. Choosing the power 50% 
is convenient for calculation. It enables us to avoid calculating the variance 
of the asymptotic distribution [Lindsay and Liu (2005)]. We will use this 
secondary measure N* as an aid in the interpretation of radius c in our data 
sections. 

3. Likelihood ratio test for tubular model. In this section we will construct
the likelihood ratio test for the hypotheses $H_0: \tau \in \mathcal{M}_c$ vs. $H_1: \tau \notin \mathcal{M}_c$.
The test statistic is described, and an iteratively reweighted algorithm
is developed to find the null estimators. Interpretations of the estimators are
discussed at the end of this section.

3.1. Finding the test statistic. We will use the likelihood ratio test (LRT)
for testing the null hypothesis (2). When the model is multinomial with the
observed proportions $d$, the best model element in tubular model $\mathcal{M}_c$ is
defined by

$\hat p_c = \arg\min_{p \in \mathcal{M}_c} L^2(d, p),$

where $L^2$ is the likelihood distance, namely $L^2(d, p) = \sum d(t) \log(d(t)/p(t))$.
The best model element is also the maximum likelihood estimator of the true
multinomial probability density $\tau$ under the null. The statistic $2nL^2(d, \mathcal{M}_c) =
2nL^2(d, \hat p_c)$ equals the LRT statistic for testing the hypothesis $H_0: \tau \in \mathcal{M}_c$
vs. $H_1: \tau \notin \mathcal{M}_c$.
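As a concrete reading of these definitions, the following minimal Python sketch evaluates $L^2$, $K^2$ and the statistic $2nL^2$ for probability vectors stored as plain lists. The function names and the convention $0 \log 0 = 0$ are assumptions made for this illustration; they are not taken from the paper.

```python
import math


def likelihood_distance(d, p):
    """L^2(d, p) = sum_t d(t) * log(d(t) / p(t)), taking 0 * log(0 / p) as 0."""
    total = 0.0
    for dt, pt in zip(d, p):
        if dt > 0:
            if pt <= 0:
                return math.inf  # d puts mass where p has none
            total += dt * math.log(dt / pt)
    return total


def kl_distance(p, m):
    """K^2(p, m) = sum_t m(t) * log(m(t) / p(t)), in the argument order of the text."""
    return likelihood_distance(m, p)


def lrt_statistic(n, d, p):
    """2 n L^2(d, p) for observed proportions d, fitted probabilities p, sample size n."""
    return 2.0 * n * likelihood_distance(d, p)
```

Note that the two distances differ only in the order of their arguments, which is why `kl_distance` simply delegates to `likelihood_distance`.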

The problem of finding the test statistic $L^2(d, \hat p_c)$ can be written as the
following optimization problem:

(4)    $\min_p L^2(d, p)$ subject to $p \in \mathcal{M}_c.$

That is, we minimize the likelihood distance subject to the constraint
$K^2(p, \mathcal{M}) \le c$.

In this section, we show how the special structure of $L^2$ and $K^2$ can be
used to turn (4) into a simple unconstrained problem. To do so, we first
consider a simpler problem. Suppose that $\mathcal{M}$ contains a single density, that
is, $\mathcal{M} = \{m_0\}$. The simpler optimization problem is then

(5)    $\min_p L^2(d, p)$ subject to $K^2(p, m_0) \le c.$

Theorem 1. The solution $\hat p_c$ for the optimization problem (5) exists on
the boundary of $\mathcal{M}_c$ and has the form

(6)    $\hat p_c = \pi d + (1 - \pi) m_0,$

where $\pi$ is a unique value in [0, 1] forcing the solution $\hat p_c$ to be on the tube
boundary, so $K^2(\hat p_c, m_0) = c$.

Proof. We use the method of Lagrange multipliers for the constraints
$K^2(p, m_0) \le c$ and $\sum p(t) = 1$. □
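The Lagrange argument can be spelled out in a few lines. The following is a sketch under the assumptions that $d$ lies outside the tube (so the distance constraint is active) and that all cells of $m_0$ are positive; the multiplier names $\lambda$ and $\mu$ are introduced here for illustration.

```latex
\begin{align*}
\Lambda(p,\lambda,\mu)
  &= \sum_t d(t)\log\frac{d(t)}{p(t)}
   + \lambda\Bigl[\sum_t m_0(t)\log\frac{m_0(t)}{p(t)} - c\Bigr]
   + \mu\Bigl[\sum_t p(t) - 1\Bigr], \\
\frac{\partial \Lambda}{\partial p(t)}
  &= -\frac{d(t)}{p(t)} - \lambda\,\frac{m_0(t)}{p(t)} + \mu = 0
  \quad\Longrightarrow\quad
  p(t) = \frac{d(t) + \lambda\, m_0(t)}{\mu}, \\
\sum_t p(t) = 1
  &\quad\Longrightarrow\quad \mu = 1 + \lambda, \\
p &= \pi d + (1-\pi)\, m_0, \qquad \pi = \frac{1}{1+\lambda}.
\end{align*}
```

The multiplier $\lambda > 0$, and hence $\pi$, is then fixed by the active constraint $K^2(p, m_0) = c$.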

That is, $\hat p_c$ is on the line connecting $m_0$ and $d$ and is on the boundary of
the tube. It is interesting to note that if we replaced both $L^2$ and $K^2$ with
the Euclidean distance, $\sum [d(t) - m(t)]^2$, we would have the same answer as
(6). Heuristically, the spatial curvatures of $L^2$ and $K^2$ seem to cancel each
other out, yielding a Euclidean-like structure.

Note that if we choose a fixed $\pi$-value and solve the unconstrained problem

(7)    $\min_p \pi L^2(d, p) + (1 - \pi) K^2(p, m_0),$

then the solution $\hat p_\pi$ has the same form as (6). Observe that the constraints
on $c$ and $\pi$ are different in problems (5) and (7). In (5), $c$ is a preset constant
and $\pi$ is a free parameter that must be chosen to drag the solution to the
tube boundary and hence is a function of $c$. But in (7), $\pi$ is a preselected
constant but it corresponds to a value of $c$, via

(8)    $c = K^2(\hat p_\pi, m_0),$

so $c$ is a function of $\pi$. Hence there is a one-to-one relationship between $\pi$
and $c$. This suggests a strategy for finding the tubal test statistic for a fixed
$c$. The optimization problem (7) is easier to solve. If we solve (7) on a dense
grid of $\pi$ values, they will generate a set of corresponding $c$ values using (8),
some of which would be quite close to the radius we want.
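For the single-element model the map from $\pi$ to $c$ in (8) is available in closed form, so the search over $\pi$ is easy to sketch. The code below is a minimal illustration and not the authors' implementation: it uses bisection in place of the dense grid described above, relying on the assumption that $K^2(\pi d + (1 - \pi)m_0, m_0)$ is nondecreasing in $\pi$, and all function names are hypothetical.

```python
import math


def kl_distance(p, m):
    """K^2(p, m) = sum_t m(t) * log(m(t) / p(t)); infinite if p misses a cell of m."""
    total = 0.0
    for pt, mt in zip(p, m):
        if mt > 0:
            if pt <= 0:
                return math.inf
            total += mt * math.log(mt / pt)
    return total


def mixture(pi, d, m0):
    """The candidate solution pi * d + (1 - pi) * m0 of form (6)."""
    return [pi * dt + (1.0 - pi) * mt for dt, mt in zip(d, m0)]


def radius_for_weight(pi, d, m0):
    """Equation (8): the tube radius c that corresponds to a given weight pi."""
    return kl_distance(mixture(pi, d, m0), m0)


def weight_for_radius(c, d, m0, tol=1e-12):
    """Invert (8) by bisection; returns 1.0 when d already lies inside the tube."""
    if radius_for_weight(1.0, d, m0) <= c:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if radius_for_weight(mid, d, m0) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With the weight in hand, the null estimator is `mixture(weight_for_radius(c, d, m0), d, m0)`.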

3.2. Optimization steps and an algorithm. Suppose $\mathcal{M}$ is not a single
element $\{m_0\}$. Problem (7) suggests consideration of the extended problem

(9)    $\min_p \pi L^2(d, p) + (1 - \pi) K^2(p, \mathcal{M}).$

Remarkably, once again a solution $\hat p_\pi$ to problem (9) for fixed $\pi$ is also a
solution $\hat p_c$ to problem (4) for the appropriate choice of $c$, namely $c = K^2(\hat p_\pi, \mathcal{M})$
[Liu (2003)].

For models $\mathcal{M} = \{m_\theta\}$ with more than one element, (9) is equivalent to

(10)    $\min_\theta \min_p \pi L^2(d, p) + (1 - \pi) K^2(p, m_\theta).$

Then for fixed $\theta$, the inner minimization in (10) has the simple explicit
solution $p_\theta = \pi d + (1 - \pi) m_\theta$. This ensures that the final solution is a simple
convex combination of $d$ and a model element $m_{\hat\theta}$.

Next, to minimize over $\theta$, the optimization problem becomes

(11)    $\min_\theta \pi L^2(d, p_\theta) + (1 - \pi) K^2(p_\theta, m_\theta).$

This minimization problem is one in a class of minimum distance problems
considered in Lindsay (1994). Basu and Lindsay (2003) introduced an iteratively
reweighted estimation function approach. We apply their results.

The estimating equation of (11) can be written as a weighted form of
the likelihood equation

(12)    $\sum_t \omega(\delta(t))\, d(t)\, u_\theta(t) = 0,$

where $\delta$ is the Pearson residual function $\delta(t) = (d(t) - m_\theta(t))/m_\theta(t)$, $u_\theta =
\nabla_\theta m_\theta / m_\theta$, and the weights are $\omega(\delta) = \log[1 + \pi(\delta + 1)/(1 - \pi)]/(1 + \delta)$.
The algorithm alternates the following steps until convergence.

• Given the current estimate $\hat\theta$, create the weights $\omega(\delta(t)) = \log[1 + \pi(\delta(t) + 1)/(1 - \pi)]/(1 +
\delta(t))$.

• Solve for the new estimate of $\theta$ from the estimating equation (12).

Note that when $c = 0$, the maximum likelihood estimating equation for the
original model is

(13)    $\sum_t d(t)\, u_\theta(t) = 0.$

Compare (13) with the estimating equation (12) of the tubular model. We
can think of solving (12) as obtaining the maximum likelihood estimate from
"pseudo data" $d^*(t) = \omega(\delta(t))\, d(t)$. The advantage is that for fixed $\omega(\delta(t))$
one can easily use standard maximum likelihood statistical software
to solve this reweighted estimating equation.
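The two alternating steps can be sketched for one specific model. The code below assumes the row-column independence model for a two-way table, for which the maximum likelihood fit to pseudo data is simply the product of the normalized pseudo-data margins; this choice of model, the function names, the stopping rule and the requirement $0 < \pi < 1$ are assumptions made for illustration. Since $1 + \delta(t) = d(t)/m_\theta(t)$, the pseudo data are computed in the equivalent form $d^*(t) = m_\theta(t) \log[1 + \pi d(t)/((1 - \pi) m_\theta(t))]$, which stays well defined for empty cells.

```python
import math


def independence_fit(table):
    """Maximum likelihood fit of row-column independence to a nonnegative table."""
    total = sum(map(sum, table))
    rows = [sum(r) / total for r in table]
    cols = [sum(c) / total for c in zip(*table)]
    return [[a * b for b in cols] for a in rows]


def pseudo_data(d, m, pi):
    """d*(t) = w(delta(t)) d(t), written as m(t) * log(1 + pi d(t) / ((1 - pi) m(t)))."""
    return [[mt * math.log1p(pi * dt / ((1.0 - pi) * mt)) if mt > 0 else 0.0
             for dt, mt in zip(d_row, m_row)]
            for d_row, m_row in zip(d, m)]


def tubular_independence_fit(d, pi, tol=1e-12, max_iter=10000):
    """Alternate the reweighting step and the weighted ML step for fixed 0 < pi < 1."""
    m = independence_fit(d)  # start from the ordinary ML fit
    for _ in range(max_iter):
        new_m = independence_fit(pseudo_data(d, m, pi))
        change = max(abs(a - b)
                     for new_row, old_row in zip(new_m, m)
                     for a, b in zip(new_row, old_row))
        m = new_m
        if change < tol:
            break
    return m
```

The returned table is the fitted model element; the null estimator of form (3) is then $\pi d$ plus $(1 - \pi)$ times that table.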

3.3. Interpretation of the estimates. For a second interpretation of $\hat p_\pi$,
note that an equivalent objective function to (9) is $L^2(d, p) + \gamma K^2(p, \mathcal{M})$, for
$\gamma = (1 - \pi)/\pi$. It has the same solution $\hat p_\pi$ as the maximum of the penalized
likelihood function

$R_\gamma(p) = \sum d(t) \ln p(t) - \gamma K^2(p, \mathcal{M}),$

where $\gamma > 0$ is the penalty parameter. This criterion gives an estimator $\hat p_\pi$
that makes the likelihood large while being penalized for being distant from
the model $\mathcal{M}$ in $K^2$ terms.

Next, we consider interpretation of the estimators $\hat\theta = \hat\theta_\pi$ obtained from
(11). If we define a new distance

$\tau^2_\pi(d, m_\theta) = \pi L^2(d, \pi d + (1 - \pi) m_\theta) + (1 - \pi) K^2(\pi d + (1 - \pi) m_\theta, m_\theta),$

then $\hat\theta_\pi$ is the corresponding minimum distance estimator. When $\pi = 0.5$,
this distance is symmetric in $d$ and $m$ and of some interest as a symmetric
version of Kullback-Leibler distance. We call the symmetric distance the
mid-tube distance, written as

$\tau^2_{1/2}(d, m) = \frac{1}{2} L^2(d, d/2 + m/2) + \frac{1}{2} K^2(d/2 + m/2, m).$

One can show, using the methods of Lindsay (1994), that the resulting
estimators are, for all $\pi$, asymptotically as efficient as maximum likelihood
when the model is correct. Moreover, they increase in robustness as $\pi$
increases from 0 to 1, where $\pi = 0$ (equivalently $c = 0$) corresponds to maximum
likelihood. Indeed, the mid-tube
distance is topologically equivalent to Hellinger distance, well known for its
robustness [Liu (2003)].

The mid-tube distance has an advantage when there is a cell with
zero $\tau$-probability and positive $m$-probability (or zero $m$ and positive $\tau$),
where some conventional statistics, for example Pearson chi-square, would
fail.
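The behavior of the mid-tube distance with empty cells is easy to check numerically. The sketch below is an illustration with hypothetical function names, using the convention $0 \log 0 = 0$. It may be worth noting, as an observation added here rather than one made in the paper, that the expression coincides with what is commonly called the Jensen-Shannon divergence.

```python
import math


def _x_log_ratio(x, y):
    """x * log(x / y), with the convention that the term is 0 when x = 0."""
    return x * math.log(x / y) if x > 0 else 0.0


def mid_tube_distance(d, m):
    """(1/2) L^2(d, q) + (1/2) K^2(q, m) with midpoint q = (d + m) / 2."""
    total = 0.0
    for dt, mt in zip(d, m):
        q = 0.5 * (dt + mt)  # q > 0 whenever dt > 0 or mt > 0
        total += 0.5 * _x_log_ratio(dt, q) + 0.5 * _x_log_ratio(mt, q)
    return total
```

For two distributions with disjoint support the value is $\log 2$, which is finite, whereas the Pearson statistic would divide by zero in such cells.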

4. Distribution theory. To determine whether to reject the null hypothesis,
we need to know the limiting distribution of the test statistic. In this
section, we approximate the distribution of the test statistic $2nL^2$ by two
methods. One is based on asymptotic distribution theory and the other
method is based on a semiparametric bootstrap using the null hypothesis
estimator (3). The latter one is found to be more accurate, especially for
small values of $n$ and $c$.

4.1. Asymptotic distribution of LRT statistics. The asymptotic distribution
of our likelihood ratio statistic $2nL^2$ depends on $c$. If $c = 0$, we are
exactly in the setting of the standard likelihood ratio test of model $\mathcal{M}$ against
the unrestricted multinomial alternative. And so under model regularity,
the likelihood ratio test statistic $2nL^2$ has an asymptotic $\chi^2$ distribution
with degrees of freedom equal to the number of cells $N$ minus the number
of nonredundant parameters in $\mathcal{M}$, minus 1. This is the standard type of
limiting distribution for likelihood ratio testing.

However, if $c > 0$, the LRT statistic $2nL^2$ has, asymptotically, a mixed
distribution, with probability 0.5 equal to $\chi^2_0$ (point mass at zero) and with
probability 0.5 equal to $\chi^2_1$. That is,

(14)    $2nL^2 \to \frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1.$

This result was given for chi-squared distance in Hodges and Lehmann
(1954) and for $\pi^*$ in Rudas, Clogg and Lindsay (1994).

If we use this asymptotic result (14), the test should reject the null hypothesis
whenever the test statistic exceeds $\chi^2_1(0.90) = 2.70$, giving asymptotic
size $\alpha = 0.05$. Inverting this test procedure, we can obtain a lower 95%
confidence limit for the index $\rho^* = \rho(\tau, \mathcal{M})$ defined in Section 1. It equals
$\hat\rho_L$, where the tube radius $\hat\rho_L$ satisfies $2nL^2(d, \mathcal{M}_{\hat\rho_L}) = 2.70$.
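The critical value quoted here follows from the mixture form: the point mass at zero never exceeds a positive threshold, so a size $\alpha$ test needs the upper $2\alpha$ point of $\chi^2_1$, which is the square of a standard normal quantile. A minimal sketch using only the Python standard library, with a hypothetical function name, is:

```python
from statistics import NormalDist


def mixture_critical_value(alpha=0.05):
    """Upper-alpha point of (1/2) chi2_0 + (1/2) chi2_1, for 0 < alpha < 0.5.

    P(mixture > x) = (1/2) P(chi2_1 > x) = 1 - Phi(sqrt(x)), so setting this
    equal to alpha gives sqrt(x) = Phi^{-1}(1 - alpha).
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z * z
```

At $\alpha = 0.05$ this evaluates to roughly 2.706, the value quoted in the text as 2.70.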

4.2. Simulated distribution through bootstrapping. One should be careful
in using the asymptotic distribution in cases where $c$ is almost zero and the
sample size $n$ is not very large. In this section we offer an alternative method
to simulate the distribution of $2nL^2$ using a semiparametric bootstrap.

The logic of the mixture distribution is that for $c > 0$, the tube $\mathcal{M}_c$ has
an open interior. When $d$ falls inside the tube, the LRT statistic is zero. For
a tube with a smooth boundary, asymptotically its tangent hyperplane can
be a good approximation to the tube surface. If the data fall on the model
side of the tangent hyperplane, the likelihood deviation is zero; otherwise it is
one-dimensional normal, so the squared deviation is $\chi^2_1$. The mixture
weight is the probability that the data are on one side of a hyperplane, which
is clearly 0.5.

The discontinuity in the limiting distribution at $c = 0$ arises because the
dimension of $\mathcal{M}_0$ does not match that of $\mathcal{M}_c$ for $c > 0$. If $c$ is close to zero,
$\mathcal{M}_c$ has almost no interior and so the LRT statistic is hardly ever zero. The
limiting distribution would then be better described by the chi-squared distribution
that applies when $c = 0$.

Moreover, in the multivariate case, when the sample size n is not very 
large, the difference between the tangent hyperplane and the tube surface 
caused by the curvature of the tube can be significant. The mixture weight 
0.5 is not reliable. 

In this situation, it would be desirable to construct critical values from a 
method that takes into account the finite sample failure of the asymptotic 
approximations. We propose to use a semiparametric bootstrap method, 
which does just that. 
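One way to read the proposal is sketched below for the simplest case, a single-element model $\mathcal{M} = \{m_0\}$ with all cells positive, where the null estimator has the closed form (6). Samples of size $n$ are drawn from the null estimator, the LRT statistic is recomputed for each, and an upper quantile of the simulated values serves as the critical value. The restriction to one model element, the function names, the default number of replicates and the quantile rule are all assumptions made for illustration; for a general model the null estimator (3) would have to be refitted for every bootstrap sample.

```python
import math
import random
from collections import Counter


def likelihood_distance(d, p):
    """L^2(d, p) = sum_t d(t) log(d(t)/p(t)), with 0 log 0 = 0 (p assumed positive)."""
    return sum(dt * math.log(dt / pt) for dt, pt in zip(d, p) if dt > 0)


def kl_distance(p, m):
    """K^2(p, m) = sum_t m(t) log(m(t)/p(t)); infinite if p misses a cell of m."""
    if any(mt > 0 and pt <= 0 for pt, mt in zip(p, m)):
        return math.inf
    return sum(mt * math.log(mt / pt) for pt, mt in zip(p, m) if mt > 0)


def null_estimate(d, m0, c, tol=1e-12):
    """pi*d + (1 - pi)*m0 with pi found by bisection so that K^2 = c (form (6))."""
    def mix(pi):
        return [pi * dt + (1.0 - pi) * mt for dt, mt in zip(d, m0)]
    if kl_distance(d, m0) <= c:  # data already inside the tube
        return list(d)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_distance(mix(mid), m0) < c:
            lo = mid
        else:
            hi = mid
    return mix(lo)


def lrt_statistic(n, d, m0, c):
    """2 n L^2(d, p_c) for the tube of radius c around m0."""
    return 2.0 * n * likelihood_distance(d, null_estimate(d, m0, c))


def bootstrap_critical_value(n, d, m0, c, B=1000, level=0.95, seed=0):
    """Simulated upper quantile of the LRT statistic under the null estimator."""
    rng = random.Random(seed)
    p_null = null_estimate(d, m0, c)
    cells = range(len(d))
    stats = []
    for _ in range(B):
        counts = Counter(rng.choices(cells, weights=p_null, k=n))
        d_boot = [counts[t] / n for t in cells]
        stats.append(lrt_statistic(n, d_boot, m0, c))
    stats.sort()
    return stats[min(B - 1, math.ceil(level * B) - 1)]
```

The observed statistic `lrt_statistic(n, d, m0, c)` would then be compared with the returned value instead of the asymptotic 2.70.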

Examples illustrating the implementation of these testing methods are in 
the next section. 

Table 1
Cross-classification of eye color and hair color (n = 592)

                        Hair color
Eye color    Black   Brunette   Red   Blonde
Brown           68        119    26        7
Blue            20         84    17       94
Hazel           15         54    14       10
Green            5         29    14       16



10 J. LIU AND B. G. LINDSAY 



Table 2
Cross-classification of number of children by annual income (n = 25,263)

No. of                Annual income
children      0-1     1-2     2-3      3+
0            2161    3577    2184    1636
1            2755    5081    2222    1052
2             936    1753     640     306
3             225     419      96      38
4+             39      98      31      14



5. Application to the independence model. In this section we give a 
detailed examination of the tubular model in two-way contingency tables. 
The model M will be the row-column independence model. 

5.1. Two well-known examples. The following examples were used by Di- 
aconis and Efron (1985) to show problems associated with the conventional 
statistics for evaluating the model adequacy. 

For Table 1 the LRT statistic is 146.44 on 9 degrees of freedom. The model
would be rejected on the basis of these quantities. In 1994, Rudas, Clogg and
Lindsay calculated the mixture index of fit $\pi^*$. The index $\hat\pi^* = 0.298$ suggests
that the original table is far from independence, because about 30% of the
population is estimated to be outside the model. A lower (approximate) 95%
bound for $\pi^*$ is $\hat\pi_L = 0.236$.
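The conventional statistic quoted for Table 1 can be reproduced directly from the counts. The sketch below computes the likelihood ratio statistic $2nL^2$ for row-column independence together with its degrees of freedom; the function name is hypothetical.

```python
import math

# Counts of Table 1: rows are eye color (Brown, Blue, Hazel, Green),
# columns are hair color (Black, Brunette, Red, Blonde).
TABLE_1 = [
    [68, 119, 26, 7],
    [20, 84, 17, 94],
    [15, 54, 14, 10],
    [5, 29, 14, 16],
]


def lrt_independence(counts):
    """Return (2nL^2, degrees of freedom) for the row-column independence model."""
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    g2 = 0.0
    for i, r in enumerate(counts):
        for j, x in enumerate(r):
            if x > 0:  # empty cells contribute 0
                g2 += 2.0 * x * math.log(x * n / (row[i] * col[j]))
    df = (len(row) - 1) * (len(col) - 1)
    return g2, df
```

Calling `lrt_independence(TABLE_1)` gives a statistic of about 146.44 on 9 degrees of freedom, in agreement with the figures above.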

Table 2 has $2nL^2 = 569.420$ on 12 degrees of freedom. The LRT statistic
has an extremely small p-value, leading to rejection. The index $\hat\pi^*$ is 0.104,
and its lower limit is $\hat\pi_L = 0.091$. Given the potential for misclassification in
either or both factors in Table 2, a misclassification rate of the order of 10%
would not be surprising, and could explain the lack of fit. Regardless, the
main conclusion is that the data in Table 2 are much closer to the hypothesis
of row-column independence than are the data in Table 1.

5.2. Kullback distance inference. We next consider tubular model 
testing for our examples. 

We solved the optimization problem (9) for a grid of Lagrange parameters
$\pi$ chosen in (0, 1). Note that our main purpose is to find the lower confidence
limit of the tube radius. Thus we will need to compute the tube radius
as a function of $\pi$, that is, $c = K^2(\pi d + (1 - \pi)m, m)$.

In Table 3, the LRT statistics $2nL^2$ along with $c$ are given for various
values of $\pi$ for the data in Table 1. Note that when $c = 0$ the original model
$\mathcal{M}$ is obtained. A monotone reduction in $L^2$-values is evident for increasing
values of tubular radius $c$. For any value of $c \ge \hat\rho^*$, the tubular model will
also be saturated, yielding likelihood statistics of 0. A lower (approximate)
95% bound for $\rho^*$ is $\hat\rho_L = 0.101$, as noted in Table 3.

Table 4 gives analogous results for the data in Table 2. The value of
$\hat\rho^*$ is about 0.011, and $\sqrt{\hat\rho^*} = 0.106$. The approximate 95% lower bound is
$\hat\rho_L = 0.010$ with $\sqrt{\hat\rho_L} = 0.099$. Comparison of the latter with the value of
0.318 of Table 3 establishes the conclusion, just as for $\pi^*$, that the data in
Table 2 are about three times closer to row-column independence than are the
data in Table 1.

Note that the values of $\sqrt{\hat\rho^*}$, 0.369 and 0.106, are quite similar to those
of $\hat\pi^*$, which were 0.298 and 0.104 in the two tables. This similarity between
these (and other) measures gives some assistance for the overall
interpretation of the distance, as one can select a distance with the most natural
interpretation.

The close relationship between the index $\pi$ and the distances can effectively
replace intensive grid search in $\pi$ for finding the values of $\pi$ that
correspond to target values of $c$ and $2nL^2$. Indeed, both $\sqrt{K^2}$ and $\sqrt{L^2}$
behave nearly linearly in $\pi$ across long regions of $\pi$, reflecting a nearly
Euclidean structure arising from the two distances together.



Table 3
Likelihood statistics for the tubular model applied to Table 1

$\pi$     $\sqrt{c}$   $c$                          $2nL^2$
0.000     0.000        0.000                        146.44
0.444     0.155        0.024                         48.68
0.774     0.278        0.077                          8.77
0.876     0.318        0.101 ($= \hat\rho_L$)         2.71
0.877     0.319        0.102                          2.67
0.990     0.365        0.133                          0.02
1.000     0.369        0.136 ($= \hat\rho^*$)         0.00






Table 4
Likelihood statistics for the tubular model applied to Table 2

$\pi$     $\sqrt{c}$   $c$                          $2nL^2$
0.000     0.000        0.000                        569.42
0.801     0.085        0.007                         22.75
0.865     0.092        0.008                         10.48
0.931     0.099        0.010 ($= \hat\rho_L$)         2.74
0.958     0.105        0.011                          1.02
1.000     0.106        0.011 ($= \hat\rho^*$)         0.00

[Figure 1: QQ-plot panels titled "Tube radius = 0.13" and "Tube radius = 0.00001", each plotted against "Chi-Sq DF=9".]

Fig. 1. The simulated distribution of $2nL^2$ vs. theoretical asymptotic distribution.

5.3. Bootstrap results. We here apply the bootstrap method to Table 1
($4 \times 4$ table with $n = 592$ observations) and estimate the distribution of the
LRT statistic $2nL^2$. Our bootstrap sample size is B = 10^. We simulate
for different tube radii, as $c$ takes different values from 0 to $\hat\rho^*$.

Asymptotic theory says that $2nL^2$ has the mixture distribution $\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1$
if $c > 0$, and $\chi^2_9$ if $c = 0$. Figure 1 gives the QQ plots that compare the
simulated distributions with the asymptotic distributions of $2nL^2$ for different
tubes. When $c = 0$ ($\pi = 0$), they match very well. But when $c$ is positive,
the bootstrap and asymptotic distributions differ. Some of the discrepancy
might be an effect of the curved boundary of the tube surface. When $c$ is
positive but close to zero, the simulated and the asymptotic distributions
strongly depart from each other, with the simulated sampling distribution
much closer to $\chi^2_9$ than to $\frac{1}{2}\chi^2_0 + \frac{1}{2}\chi^2_1$.

We calculated the upper 95% quantile of the bootstrap distribution to
be used as the estimated critical value for testing $H_0$. Figure 2 plots the
LRT statistics and the simulated 95% critical values for different tube radii.
When $c > 0$, the simulated critical value and theoretical asymptotic critical
value are different, especially for $c$ near zero.

The bootstrapped likelihood ratio test rejects tube models only at smaller
tube radii $c$ than the asymptotic test does. In this example, we used linear
interpolation and found that when $\sqrt{c}$ is smaller than 0.300, the likelihood
ratio test will reject $H_0$, so a lower 95% confidence limit for $\sqrt{K^2}$ is 0.300,
compared to 0.318 using the asymptotic method. At this value of $\sqrt{c} = 0.300$
the simulated critical value for $2nL^2$ is 4.24, not very close to 2.70.

[Figure 2: horizontal axis "SQRT of c", running from 0.0 to 0.3.]

Fig. 2. The LRT statistics (o) and the simulated critical values (+) against the tube
radius.

6. Application to loglinear models. In this section we will apply the 
tubular model to a multidimensional contingency table problem. We will 
compare our tubular index with the conventional LRT statistic for logistic 
regression models. 

6.1. The data and some results on loglinear models. The data were collected
from a sample survey of 8,036 army recruits, identified by race (C),
geographic origin (R), location of current training camp (L) and their
preference for camp location (P). The data of interest for this example are shown
in Table 5 of Rudas (1991). Notice that the sample size is quite large, so
one might find that some models do not fit, but still provide a high-quality
approximation.

We will focus on comparison of various models. The candidate models are 
those with all main effects, or some two-order or three-order interactions. 

The LRT and AIC, BIC indices, and our tubular model statistics for the
candidate models are summarized in Table 6. Besides using the statistic
$\hat\rho^*$, we calculate the lower confidence limit by inverting the asymptotic
critical value of the LRT statistic, with size $\alpha = 0.05$.

Table 5
Preference of World War II Recruits for locations of training camp

                                           Preference (P)
Color (C)    Region (R)    Location (L)    North    South
Black        North         North             387       36
                           South             876      250
             South         North             383      270
                           South             381     1712
White        North         North             955      162
                           South             874      510
             South         North             104      176
                           South              91      869

Except for the AIC and BIC values, which adjust for the effect of the
number of parameters, the statistics $2nL^2$, $\hat\rho^*$ and $\hat\rho_L$ all improve as we add
extra terms to models. For example, when Model 2 changes to Model 3,
the goodness-of-fit improves. To minimize the AIC or BIC index, Model 4
would be selected as the best model among candidates. We have included
AIC and BIC here for comparison, but note that they address the model
selection problem from the point of view of risk, not distance. As discussed
in Lindsay and Liu (2005), this makes each of them a measure of model quality that
depends on the sample size $n$ used in the experiment.

6.2. Interpretation of the tubular radius. We then proceed to check the
performance of the tubular statistics in assessing model fit. We note that the
AIC and BIC select Model 4, which is also the most parsimonious model in
which the lower confidence limit of the tubular index is zero. If we were to
select a model by the rule "use the simplest model with the $\hat\rho_L$ value being
zero", this would be equivalent to finding the smallest acceptable model by
hypothesis testing. Using a model with $\hat\rho_L > 0$ would correspond to allowing
a tolerance in the model fit. In order to set an acceptable lower limit for
$\rho^*$, one needs a statistically meaningful interpretation.

Table 6
$2nL^2$, AIC, BIC and tubular statistics for World War II Recruits data (k is the
dimension of the parameter space)

Model                                  k    $2nL^2$    AIC      BIC      $\sqrt{\hat\rho^*}$   $\sqrt{\hat\rho_L}$
1  Main effects                        4    4211.3     4219.3   4247.3   0.56                  0.55
2  Model 1 + all 2-way interactions   10      78.02      98.0    167.9   0.070                 0.057
3  Model 2 + CRL                      11      24.96      46.9    123.9   0.039                 0.026
4  Model 2 + CRL RLP                  12       1.45      25.5    109.4   0.0095                0
5  Model 2 + CRL CRP RLP              13       0.68      26.7    117.6   0.0065                0
6  Model 2 + CRL CRP CLP RLP          14       0.67      28.7    126.6   0.0064                0

In Table 7 we show the tubular index $\hat\rho^* = K^2(d, \mathcal{M})$ for each model, and
for comparison, the mid-tube distance $\tau^2_{1/2}$, which is very similar in value
after appropriate scaling ($\rho^* \approx 4\tau^2_{1/2}$). Although it is hard to give absolute
meaning to $\sqrt{\hat\rho^*}$, by comparing relative distances one can see improvement
in fit when going from Model 1 to 2, but smaller gains thereafter.

Also shown in Table 7 is the value of the $\pi^*$ index. From this we note that
for Model 3 only 6.5% of the population is not fit by the model. Relative
magnitudes of $\sqrt{\hat\rho^*}$ are still similar to those of $\hat\pi^*$, but the absolute magnitudes are
not as close as those in Section 5, again suggesting that Model 3 describes
a much larger fraction of the population.

In order to further interpret these distances, we use the index $N^*$ briefly
described in Section 2. We implemented that idea here as follows. For each
model $\mathcal{M}$, we used the size $\alpha = 0.05$ likelihood ratio test of $H_0: \tau \in \mathcal{M}$
against a general alternative. Given a nominal sample size $N$, we can then
simulate samples from the empirical distribution $d$. The proportion of rejections
then provides us with an estimate of the power at sample size $N$ under
the estimated alternative $d$. We then find the sample size $N^*$ at which the
power is 0.5. [For more details see Lindsay and Liu (2005).] This sample size
then represents the maximum sample size at which samples from $d$ are hard
to distinguish from samples from $\mathcal{M}$.
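The simulation just described can be sketched as follows. For brevity the sketch uses the row-column independence model of Section 5 rather than the loglinear models of this section, takes the upper 5% point of $\chi^2_9$ (16.919) as a fixed critical value, and searches for $N^*$ by doubling followed by bisection on the simulated power. The function names, the number of replicates and the search strategy are assumptions made for illustration, and because the power is estimated by simulation the returned value is only approximate.

```python
import math
import random
from collections import Counter

CHI2_9_UPPER_5 = 16.919  # upper 5% point of the chi-squared distribution, 9 df


def g2_independence(counts):
    """Likelihood ratio statistic for row-column independence (0 log 0 = 0)."""
    n = sum(map(sum, counts))
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    return sum(2.0 * x * math.log(x * n / (row[i] * col[j]))
               for i, r in enumerate(counts)
               for j, x in enumerate(r) if x > 0)


def estimated_power(N, table, crit, reps, rng):
    """Share of samples of size N drawn from the table that reject independence."""
    n_rows, n_cols = len(table), len(table[0])
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    weights = [table[i][j] for i, j in cells]
    rejections = 0
    for _ in range(reps):
        draw = Counter(rng.choices(cells, weights=weights, k=N))
        counts = [[draw[(i, j)] for j in range(n_cols)] for i in range(n_rows)]
        if g2_independence(counts) > crit:
            rejections += 1
    return rejections / reps


def credibility_index(table, crit, reps=300, seed=0, n_max=10**6):
    """Approximate smallest N at which the simulated power reaches 0.5."""
    rng = random.Random(seed)
    lo, hi = 1, 2  # a sample of size 1 always gives a statistic of 0
    while estimated_power(hi, table, crit, reps, rng) < 0.5:
        lo, hi = hi, 2 * hi
        if hi > n_max:
            return None  # power never reached 0.5 within the search range
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if estimated_power(mid, table, crit, reps, rng) < 0.5:
            lo = mid
        else:
            hi = mid
    return hi
```

Here `table` may hold either counts or proportions, since only the relative cell weights are used when sampling.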

In Table 7 the $N^*$ values are not reported for Models 4, 5 and 6 because
these models were accepted in testing. That is, it is unreasonable to estimate
the largest sample size for which the model is descriptive of the data when
it is larger than the actual sample size. From this table we can see that the
rejected Model 1, with root radius 0.56, is an extremely poor description
of data samples of any size larger than 5. Model 2, at root radius 0.070,
describes samples of quite large size, 950. However, with small losses in
parsimony, one could use Model 3 and gain considerable descriptive power
over Model 2. Choice of a particular model in practice would then depend
on the tradeoff between explanatory power and parsimony that one chooses
to pursue.

Table 7
Tubular statistics, $\pi^*$ and the credibility index for World War II recruits data

Model    k    $\sqrt{\hat\rho^*}$   $2\hat\tau_{1/2}$   $\hat\pi^*$   $N^*$
1        4    0.563                 0.517               0.715         5
2       10    0.0696                0.0696              0.143         950
3       11    0.0388                0.0394              0.065         3000
4       12    0.00945               0.00947                           > n
5       13    0.00650               0.00651                           > n
6       14    0.00642               0.00643                           > n

7. Discussions. In this paper we introduced the semiparametric toler- 
ance region of a parametric model in the multinomial setting. We focused 
on the Kullback-Leibler distance to define the statistical tube. A likelihood 
ratio test procedure was developed for testing the tubular hypothesis that 
the true distribution is in such a tubular neighborhood of the model. The 
asymptotic and the simulated distributions of the LRT statistic were inves- 
tigated and the lower confidence limit was then constructed. 

We could use other distances to define the tube, such as Hellinger distance. 
The mid-tube distance proposed in Section 3.3 is another choice, since it is 
topologically equivalent to Hellinger distance. The most important feature 
relative to tubal inference is that when one uses Hellinger distance both for 
the tube distance and for the test statistic, one gets a closed form solution 
for the single element model as in Theorem 1 [see Liu (2003)]. One can mimic 
much of the development of this paper including fast algorithms. 

Finally, the concept of tubular tolerance region could be extended to 
continuous distributions using the empirical likelihood. So far we applied 
our methods only for multinomial models. The LRT approach in the models 
can be replaced by empirical likelihood methods, enabling one to generalize 
the concept of tube to continuous distributions. This is under investigation
by the authors.